First, load the pre-written group and word lists used in the analyses:
The agentic and communal lists were borrowed from https://onlinelibrary.wiley.com/doi/10.1002/ejsp.2561; here are the first few entries of each (agentic first, then communal):
## [1] "able" "accomplish" "accomplishment" "accuracy" "accurate"
## [6] "achieve"
## [1] "accept" "acceptable" "acceptance" "accommodate" "accommodation"
## [6] "accompany"
The group word lists and the trait list were taken from https://pubmed.ncbi.nlm.nih.gov/35787033/; the first few entries of the men, women, and trait lists are shown in that order:
## [1] "men" "man" "male" "males" "masculine" "masculinity"
## [1] "women" "woman" "female" "females" "feminine" "femininity"
## [1] "able" "abrupt" "absentminded" "abusive" "accommodating"
## [6] "accurate"
The role titles were scraped from https://theodora.com/dot_index.html, a 1971 survey of role titles, and merged with the one-word titles from O*NET, its modern equivalent (https://www.onetonline.org/find/all). The combined list was then merged with the chore list to represent unpaid labor.
## [1] "referee" "wirer" "stoneworker" "doweler" "clerk" "boiler"
The workhorse function iterates over each decade, computes the MAC score between each word and each group, and then finds the Pearson correlation between the resulting score lists for a pair of groups (demonstrated visually later).
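The steps just described can be sketched as follows. This is a minimal illustration in Python, not the R code behind this notebook: the function names (`cosine`, `mac`, `pearson`, `group_similarity_by_decade`), the dictionary layout of the embeddings, and the two-dimensional toy vectors are all assumptions made for the example. It also assumes that at least one word from each group is present in every decade and that the scores are not constant.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mac(words, group, emb):
    """Average cosine similarity of each word to the group words found in emb."""
    members = [g for g in group if g in emb]
    return {w: sum(cosine(emb[w], emb[g]) for g in members) / len(members)
            for w in words if w in emb}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def group_similarity_by_decade(embeddings, words, group1, group2):
    """For each decade, correlate the words' MAC scores with group1 and group2."""
    results = {}
    for decade, emb in embeddings.items():
        s1 = mac(words, group1, emb)
        s2 = mac(words, group2, emb)
        shared = [w for w in words if w in s1 and w in s2]  # skip words missing that decade
        results[decade] = pearson([s1[w] for w in shared], [s2[w] for w in shared])
    return results

# Hypothetical toy embeddings: {decade: {word: vector}}.
toy = {
    1990: {"man": (1.0, 0.0), "woman": (0.8, 0.6),
           "able": (0.9, 0.1), "kind": (0.2, 0.8), "bold": (0.5, 0.5)},
    2000: {"man": (1.0, 0.0), "woman": (0.9, 0.4),
           "able": (0.9, 0.1), "kind": (0.2, 0.8), "bold": (0.5, 0.5)},
}
words = ["able", "kind", "bold", "missing"]  # "missing" is dropped: it has no vector
result = group_similarity_by_decade(toy, words, ["man"], ["woman"])
for decade, r in sorted(result.items()):
    print(decade, round(r, 3))
```

Returning one coefficient per decade keeps the output in the same shape as the year/value tables shown later in this document.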
An example of how the mac function works (using engall 1990): here we compute the mean average cosine similarity (MAC) of each word in the first list to the list of animals. As expected, the animals in the first list have the highest MAC scores.
##   elephant      horse      tiger      happy      weird        car
## 0.29825993 0.26152604 0.32429946 0.01836855 0.12177856 0.09239695
You can compute the cosine similarity of any two words by replacing the lists with single words.
## happy
## 0.4395598
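Both examples above follow from the same definition, which the sketch below illustrates in Python (not the notebook's R code). The `mac` signature, the helper `cosine`, and the three-dimensional vectors are hypothetical stand-ins. Real embeddings would come from the engall 1990 model, so the numbers here do not reproduce the output above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mac(targets, group, vectors):
    """For each target word, average its cosine similarity to every group word."""
    return {t: sum(cosine(vectors[t], vectors[g]) for g in group) / len(group)
            for t in targets}

# Hypothetical toy vectors, chosen so the animal words point in a similar direction.
vectors = {
    "cat":   (1.0, 0.2, 0.0),
    "dog":   (0.9, 0.3, 0.1),
    "horse": (0.8, 0.1, 0.2),
    "happy": (0.1, 0.9, 0.3),
    "sad":   (0.0, 0.8, 0.5),
}

# List against list: the animal target scores higher than the non-animal target.
scores = mac(["horse", "happy"], ["cat", "dog"], vectors)
print(scores)

# Single word against single word: the average over one group word
# is just the cosine similarity of the pair.
single = mac(["happy"], ["sad"], vectors)
print(single)
```

With a one-word group the average has a single term, which is why passing single words returns a plain cosine similarity.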
We can plot the MAC scores for two different groups against each other like so:
The titles of the plots contain the Pearson coefficient, which is what we will use to measure the similarity of the two groups.
The 1810 COHA plot showed an odd correlation, so let’s check what proportion of each group’s words was available in each decade, as low coverage could be skewing the slope.
##            men     women     human  nonhuman year
## 1810 0.4242424 0.3714286 0.5000000 0.1111111 1810
## 1820 0.5757576 0.5714286 0.6428571 0.4444444 1820
## 1830 0.6969697 0.6571429 0.6428571 0.6111111 1830
## 1840 0.6666667 0.6857143 0.6428571 0.6111111 1840
## 1850 0.6969697 0.7142857 0.6428571 0.6666667 1850
## 1860 0.6969697 0.7142857 0.7142857 0.6666667 1860
## 1870 0.6969697 0.6857143 0.8571429 0.6111111 1870
## 1880 0.7272727 0.7142857 0.9285714 0.7222222 1880
## 1890 0.7575758 0.6857143 0.9285714 0.6111111 1890
## 1900 0.7272727 0.7428571 0.9285714 0.7222222 1900
## 1910 0.7878788 0.7428571 0.9285714 0.6111111 1910
## 1920 0.8181818 0.6857143 1.0000000 0.7222222 1920
## 1930 0.8181818 0.7142857 0.9285714 0.7222222 1930
## 1940 0.8181818 0.7142857 0.9285714 0.7222222 1940
## 1950 0.7575758 0.7142857 1.0000000 0.8333333 1950
## 1960 0.7575758 0.6857143 1.0000000 0.8888889 1960
## 1970 0.7878788 0.6571429 1.0000000 0.8888889 1970
## 1980 0.7272727 0.6857143 1.0000000 0.8333333 1980
## 1990 0.6969697 0.7142857 1.0000000 0.8888889 1990
## 2000 0.7575758 0.7142857 1.0000000 0.8333333 2000
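Each cell in the table above is the share of a group's words that have an embedding in that decade. A minimal Python sketch of that calculation follows (not the notebook's R code). The function name `coverage` and the per-decade vocabularies are hypothetical, and only the six men words printed earlier are used, so the results do not reproduce the table.

```python
def coverage(group_words, vocabulary):
    """Fraction of group_words that appear in the decade's vocabulary."""
    vocab = set(vocabulary)
    return sum(1 for w in group_words if w in vocab) / len(group_words)

# First six entries of the men list, as printed earlier in this document.
men = ["men", "man", "male", "males", "masculine", "masculinity"]

# Hypothetical vocabularies: an early decade with sparse coverage, a late one with full coverage.
vocab_by_decade = {
    1810: {"men", "man", "male", "horse", "house"},
    1990: {"men", "man", "male", "males", "masculine", "masculinity", "horse"},
}

table = {decade: coverage(men, vocab) for decade, vocab in vocab_by_decade.items()}
for decade, share in sorted(table.items()):
    print(decade, share)
```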
Far fewer words were available in that first decade; let’s check for statistical outliers (men first, then women).
## Decade Value
## 1 1810 0.4242424
## Decade Value
## 1 1810 0.3714286
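The source does not say which outlier rule produced the output above. The Python sketch below (not the notebook's R code) assumes a simple z-score rule with a threshold of 3; the rule, the threshold, and the function name `zscore_outliers` are all assumptions. The input values are the men and women columns of the COHA table above.

```python
import statistics

decades = list(range(1810, 2010, 10))  # 1810, 1820, ..., 2000

# Men and women columns of the COHA coverage table above.
men = [0.4242424, 0.5757576, 0.6969697, 0.6666667, 0.6969697,
       0.6969697, 0.6969697, 0.7272727, 0.7575758, 0.7272727,
       0.7878788, 0.8181818, 0.8181818, 0.8181818, 0.7575758,
       0.7575758, 0.7878788, 0.7272727, 0.6969697, 0.7575758]
women = [0.3714286, 0.5714286, 0.6571429, 0.6857143, 0.7142857,
         0.7142857, 0.6857143, 0.7142857, 0.6857143, 0.7428571,
         0.7428571, 0.6857143, 0.7142857, 0.7142857, 0.7142857,
         0.6857143, 0.6571429, 0.6857143, 0.7142857, 0.7142857]

def zscore_outliers(decades, values, threshold=3.0):
    """Return (decade, value) pairs lying more than `threshold` sample SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(d, v) for d, v in zip(decades, values) if abs(v - mean) / sd > threshold]

print(zscore_outliers(decades, men))    # [(1810, 0.4242424)]
print(zscore_outliers(decades, women))  # [(1810, 0.3714286)]
```

Under this assumed rule only 1810 is flagged in both columns, which matches the output shown above.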
Repeat for engall:
##            men     women     human  nonhuman year
## 1800 0.7272727 0.6571429 0.6428571 0.5555556 1800
## 1810 0.7575758 0.7142857 0.7142857 0.6111111 1810
## 1820 0.7878788 0.8000000 0.7142857 0.6111111 1820
## 1830 0.7878788 0.8000000 0.7142857 0.6111111 1830
## 1840 0.7878788 0.8285714 0.7142857 0.7777778 1840
## 1850 0.7878788 0.8285714 0.8571429 0.7777778 1850
## 1860 0.7878788 0.8285714 0.8571429 0.7777778 1860
## 1870 0.7878788 0.8000000 0.9285714 0.8333333 1870
## 1880 0.8484848 0.8857143 0.9285714 0.8333333 1880
## 1890 0.8484848 0.9142857 0.9285714 0.8333333 1890
## 1900 0.8484848 0.9142857 0.9285714 0.8333333 1900
## 1910 0.8787879 0.8571429 1.0000000 0.8333333 1910
## 1920 0.8484848 0.8857143 1.0000000 0.8333333 1920
## 1930 0.8484848 0.8285714 1.0000000 0.8333333 1930
## 1940 0.8484848 0.8285714 1.0000000 0.8333333 1940
## 1950 0.8787879 0.8857143 1.0000000 0.8333333 1950
## 1960 0.9090909 0.9428571 1.0000000 0.9444444 1960
## 1970 0.9393939 0.9428571 1.0000000 1.0000000 1970
## 1980 0.9696970 0.9714286 1.0000000 1.0000000 1980
## 1990 1.0000000 1.0000000 1.0000000 1.0000000 1990
## [1] Decade Value
## <0 rows> (or 0-length row.names)
## [1] Decade Value
## <0 rows> (or 0-length row.names)
Engall has no outliers, as expected.
Now we can begin to plot the actual correlation values over time, starting with engall:
Now, let’s look at the actual magnitudes of the MAC scores (rather than the Pearson correlations). To do this, we take the mean of all words’ MAC scores with a single group.
We can also do a baseline test with different groups to see if the men/women correlations are uniquely high.
Access the files containing all the data, which can be filtered in Excel:
##   year     value group1index group2index wordterms corpus
## 1 1800 0.7608427         men       women   agentic engall
## 2 1810 0.7966443         men       women   agentic engall
## 3 1820 0.7896832         men       women   agentic engall
## 4 1830 0.7644530         men       women   agentic engall
## 5 1840 0.7750871         men       women   agentic engall
## 6 1850 0.7651039         men       women   agentic engall
##   year       value group1index wordterms corpus
## 1 1800 0.017776823         men   agentic engall
## 2 1810 0.013167537         men   agentic engall
## 3 1820 0.004411563         men   agentic engall
## 4 1830 0.008945463         men   agentic engall
## 5 1840 0.003948094         men   agentic engall
## 6 1850 0.011462237         men   agentic engall
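The same filtering can be done in code rather than in a spreadsheet. The Python sketch below (not the notebook's R code) assumes the data are stored as CSV with the column names shown above. The file names are not given in the source, so the first six rows of the first table are embedded as text instead. The helper `filter_rows` is hypothetical.

```python
import csv
import io

# First six rows of the first table above, written as CSV text.
# With the real file, io.StringIO(raw) would be replaced by an open file handle.
raw = """year,value,group1index,group2index,wordterms,corpus
1800,0.7608427,men,women,agentic,engall
1810,0.7966443,men,women,agentic,engall
1820,0.7896832,men,women,agentic,engall
1830,0.7644530,men,women,agentic,engall
1840,0.7750871,men,women,agentic,engall
1850,0.7651039,men,women,agentic,engall
"""
rows = list(csv.DictReader(io.StringIO(raw)))

def filter_rows(rows, **conditions):
    """Keep rows whose columns equal every given column=value condition."""
    return [r for r in rows if all(r[col] == val for col, val in conditions.items())]

subset = filter_rows(rows, corpus="engall", wordterms="agentic")
peak = max(subset, key=lambda r: float(r["value"]))  # csv values are strings
print(peak["year"], peak["value"])  # 1810 0.7966443
```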